FlexiFaCT: Scalable Flexible Factorization of Coupled Tensors on Hadoop
Abstract
Given multiple relational data sets that share a number of dimensions, how can we efficiently decompose the data into latent factors? Factorization of a single matrix or tensor has attracted much attention, for example in the Netflix challenge, where users rate movies. However, we often have additional side information, such as demographic data about the users in the Netflix example. Incorporating this information leads to the coupled factorization problem, which so far has been solved only for relatively small data sets. We provide a distributed, scalable method for decomposing matrices, tensors, and coupled data sets through stochastic gradient descent on a variety of objective functions. We offer the following contributions: (1) Versatility: our algorithm can perform matrix, tensor, and coupled factorization, with flexible objective functions including the Frobenius norm, the Frobenius norm with ℓ1-induced sparsity, and non-negative factorization. (2) Scalability: FlexiFaCT scales to unprecedented sizes in both the data and the model, with up to billions of parameters, and runs on standard Hadoop. (3) Convergence: we give proofs showing that FlexiFaCT converges for this variety of objective functions, even with projections.
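To make "coupled factorization through stochastic gradient descent, even with projections" concrete, here is a rough single-machine sketch in Python. It fits a 3-way tensor X[i,j,k] ≈ Σ_r A[i][r]·B[j][r]·C[k][r] jointly with a matrix Y[i,q] ≈ Σ_r A[i][r]·D[q][r] that shares the first-mode factor A, taking one SGD step per observed entry of either data set. Several things are assumptions of this sketch, not details taken from the abstract: the CP/PARAFAC model form, the squared-error loss, the learning rate, the soft-thresholding step used for the ℓ1 option, and all function names (cp_value, squared_loss, sgd_step, coupled_sgd). It does not reproduce FlexiFaCT's Hadoop implementation or how it distributes updates across machines.

```python
import math
import random

# Hypothetical sketch: names, loss and hyperparameters are illustrative only,
# not FlexiFaCT's actual code.


def cp_value(factors, idx):
    """CP model value: sum over r of the product of factors[m][idx[m]][r]."""
    rank = len(factors[0][0])
    total = 0.0
    for r in range(rank):
        prod = 1.0
        for m, i in enumerate(idx):
            prod *= factors[m][i][r]
        total += prod
    return total


def squared_loss(factors, obs):
    """Sum of squared errors over observed entries {index tuple: value}."""
    return sum((cp_value(factors, idx) - v) ** 2 for idx, v in obs.items())


def sgd_step(factors, idx, value, lr, l1=0.0, nonneg=False):
    """One SGD update on the squared error of a single observed entry."""
    rank = len(factors[0][0])
    err = cp_value(factors, idx) - value
    # Compute every gradient first so all touched rows use the same residual.
    grads = []
    for m in range(len(idx)):
        g = []
        for r in range(rank):
            prod = 1.0
            for m2, i2 in enumerate(idx):
                if m2 != m:
                    prod *= factors[m2][i2][r]
            g.append(2.0 * err * prod)
        grads.append(g)
    for m, i in enumerate(idx):
        row = factors[m][i]
        for r in range(rank):
            w = row[r] - lr * grads[m][r]
            if l1 > 0.0:
                # Soft-thresholding (proximal step) for an assumed l1 penalty.
                w = math.copysign(max(abs(w) - lr * l1, 0.0), w)
            if nonneg:
                # Projection onto the non-negative orthant.
                w = max(w, 0.0)
            row[r] = w


def coupled_sgd(tensor_obs, matrix_obs, shape, rank, epochs=300, lr=0.05,
                l1=0.0, nonneg=False, seed=0):
    """Fit X[i,j,k] ~ sum_r A*B*C and Y[i,q] ~ sum_r A*D with A shared."""
    rng = random.Random(seed)

    def init(n):
        return [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n)]

    I, J, K, L = shape
    A, B, C, D = init(I), init(J), init(K), init(L)
    # The coupling: both data sets index into the same factor matrix A.
    tensor_factors, matrix_factors = [A, B, C], [A, D]
    samples = [(tensor_factors, idx, v) for idx, v in tensor_obs.items()]
    samples += [(matrix_factors, idx, v) for idx, v in matrix_obs.items()]
    for _ in range(epochs):
        rng.shuffle(samples)
        for factors, idx, v in samples:
            sgd_step(factors, idx, v, lr, l1, nonneg)
    return A, B, C, D


if __name__ == "__main__":
    # Synthetic, noiseless rank-2 data, generated only for this demo.
    data_rng = random.Random(42)
    shape = (3, 3, 3, 4)

    def rand_rows(n):
        return [[data_rng.uniform(0.1, 1.0) for _ in range(2)] for _ in range(n)]

    tA, tB, tC, tD = (rand_rows(n) for n in shape)
    X = {(i, j, k): cp_value([tA, tB, tC], (i, j, k))
         for i in range(3) for j in range(3) for k in range(3)}
    Y = {(i, q): cp_value([tA, tD], (i, q)) for i in range(3) for q in range(4)}
    before = coupled_sgd(X, Y, shape, rank=2, epochs=0)  # initialization only
    after = coupled_sgd(X, Y, shape, rank=2, nonneg=True)
    for name, (A, B, C, D) in (("before", before), ("after", after)):
        loss = squared_loss([A, B, C], X) + squared_loss([A, D], Y)
        print(name, round(loss, 4))
```

Because A is the same list object in both factor lists, every update driven by a matrix entry also moves the first-mode factor of the tensor model, which is what makes the factorization coupled. Each step touches only one row per mode, which is what makes per-entry SGD attractive for parallelization; the scheduling that keeps concurrent workers from writing the same rows is left out of this sketch.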